163

DOI: 10.1201/9781003355205-5

C h a p t e r 5

RNA-Seq Data Analysis

5.1  INTRODUCTION TO RNA-SEQ

RNA sequencing or shortly RNA-Seq is a high-throughput sequencing technique to

­examine gene expression or profiling in biological samples. Rather than the whole genome,

RNA-Seq focuses only on transcriptomes to tell us which one of the genes in the samples

are turned on or off and to what extent. The transcriptome is the set of all messenger RNA

transcribed from genes in cells. Genes represent a small portion of the whole genome of

an organism. For instance, genes are only around 3% of the human genome. RNA, in gen-

eral, can either be translated into proteins or play functional role such as rRNA for protein

synthesis, tRNA for amino acid transfer, and microRNA that plays a role in gene regula-

tion. Out of whole transcriptome, protein-coding RNA (mRNA) is around 2%–4%, while

the greatest percentage of RNA molecules is the other types of RNA. The gene expres-

sion studies mainly focus on mRNA only because it is translated into proteins. Proteins

are required for the structure, function, and regulation of the cells forming body’s tissues

and organs. Dysfunctions of proteins implicate in most of the diseases and healthy condi-

tions. By studying the mRNA, we can find out which genes are active in a particular cell

type or under a certain condition, giving us a clue about the function of the cells and the

biological activities under that condition. Moreover, we can compare the gene activities

in different types of cells or in different conditions and study how the patterns of gene

expression change over time or in response to different stimuli. We can use such informa-

tion to understand biological activities, normal functions of the cells, effects of pathogens

or therapy, or response to any factors. In short, examining mRNA of cells under a specific

condition provides a snapshot or a checkpoint of the cellular activities at that moment.

Therefore, it is mostly studied to investigate how a certain disease like cancer may affect the

gene expression or to study the cellular response to a treatment or a condition.

A gene is a DNA sequence that consists of several functional components; each plays

a different role in the process of gene transcription into RNA. In general, the functional

gene regions can be divided into two main units: the promoter region and the coding

region. The promoter region controls the transcription of the mRNA, whereas coding